Abstract


We explore the concept of a “black-box” stochastic system, and propose an algorithm for verifying probabilistic properties of such systems based on very weak assumptions regarding system dynamics. The properties are expressed using a variation of PCTL, the Probabilistic Computation Tree Logic. We present a general model of stochastic discrete event systems, which encompasses both discrete-time and continuous-time processes, and we provide a semantics for PCTL interpreted over this model. Our presentation is both a generalization of and an improvement over some recent work by Sen et al. on probabilistic verification of “black-box” systems.


1  Introduction 


Stochastic  processes  are  used  to  model  phenomena  in  nature  that  involve  an  element  of  chance,  such  as 
the  throwing  of  a  die,  or  are  too  complex  to  fully  capture  in  a  deterministic  fashion,  such  as  the  duration 
of  a  call  in  a  telephone  system.  Certain  classes  of  stochastic  processes  have  been  studied  extensively  in 
the  performance  evaluation  and  model  checking  communities.  Numerous  temporal  logics,  such  as  TCTL 
(Alur  et  al.  1991),  PCTL  (Hansson  and  Jonsson  1994),  and  CSL  (Aziz  et  al.  2000;  Baier  et  al.  2003),  exist 
for  expressing  interesting  properties  of  various  types  of  stochastic  processes.  Model  checking  algorithms 
have  been  developed  for  verifying  properties  of  discrete-time  Markov  chains  (Hansson  and  Jonsson  1994), 
continuous-time Markov chains (Baier et al. 2003; Kwiatkowska et al. 2002), semi-Markov processes (Infante Lopez et al. 2001), generalized semi-Markov processes (Alur et al. 1991), and stochastic discrete event systems in general (Younes and Simmons 2002).

Given  a  stochastic  process,  we  are  often  interested  in  knowing  if  certain  probabilistic  properties  hold. 
For a computer network, we may want to know that the probability of exhausting bandwidth over a communication link is below some threshold. We can also associate a deadline with a probabilistic property,
for  example  that  a  message  arrives  at  its  destination  within  15  seconds  after  it  is  sent  out  with  probability 
at  least  0.9.  Properties  of  this  type  can  be  verified  using  either  numerical  or  statistical  solution  techniques, 
as  discussed  by  Younes  et  al.  (2004).  Numerical  techniques  provide  highly  accurate  results,  but  rely  on 
strong  assumptions  regarding  the  dynamics  of  the  systems  they  are  used  to  analyze.  Statistical  techniques 
only  require  that  the  dynamics  of  a  system  can  be  simulated,  and  can  therefore  be  used  for  a  larger  class  of 
stochastic  processes.  The  result  produced  by  a  statistical  method  is  only  probabilistic,  however,  and  attaining 
high  accuracy  tends  to  be  costly. 

For  some  systems,  it  may  not  even  be  feasible  to  assume  that  we  can  simulate  their  behavior.  Sen  et  al. 
(2004) consider the model checking problem for such “black-box” systems. It is assumed of a “black-box” system that it cannot be controlled to generate execution traces, or trajectories, on demand starting from arbitrary states. This is a reasonable assumption for a system that has already been deployed, and for which we are only given a set of trajectories generated during actual execution of the system. We are then asked to verify a probabilistic property of the system based on the information provided to us as a fixed set of trajectories. Statistical solution techniques are certainly required to solve this problem. The statistical method for probabilistic model checking proposed by Younes and Simmons (2002) cannot be used for verification of “black-box” systems, however, because it depends on the ability to generate trajectories on demand.

Sen et al. (2004) present an alternative solution method for verification of “black-box” systems based on statistical hypothesis testing with fixed sample sizes. We improve upon their algorithm in several ways, for example by making sure to always accept the most likely hypothesis, and we present a procedure for verifying nested probabilistic properties, which unlike that of Sen et al. actually works. The differences between the two competing approaches are discussed in detail towards the end of this paper, where we also make an effort to explain why Sen et al.’s comparison of their algorithm with the statistical model checking procedure used by Younes et al. (2004) is misguided. These two solution methods, while both based on statistical hypothesis testing, are simply not comparable in a meaningful way because the “black-box” approach does not give any a priori correctness guarantees.

We start by presenting a general model of stochastic discrete event systems that encompasses both discrete-time and continuous-time processes. We give a clear definition of a “black-box” system in terms of this model, and we define the syntax and semantics of a logic for expressing properties of general discrete event systems. Our logic has essentially the same syntax as Hansson and Jonsson’s (1994) PCTL, and we call it PCTL as well because it includes the original version of the logic as a special case, but it also includes CSL (without the steady-state operator) as defined by Baier et al. (2003). The algorithm we present for verification of “black-box” systems can handle the full logic, including properties without finite time bounds, although the accuracy of the result for such properties may very well be poor. Our algorithm, like that of Sen et al. (2004), does in fact make no guarantees regarding accuracy. Instead of respecting some a priori bounds on the probability of error, the algorithm computes a p-value for the result, which is a measure of confidence. This is really the best we can do, provided that we cannot generate trajectories for the system as we see fit and instead are restricted to use a predetermined set of trajectories.

2  Stochastic  Discrete  Event  Systems 

A stochastic process is in principle any process that evolves over time, and whose evolution we can follow and predict in terms of probability (Doob 1942, 1953). At any point in time, a stochastic process is said to occupy some state. If we attempt to observe the state of a stochastic process at a specific time, the outcome of such an observation is governed by some probability law.

A stochastic discrete event system is a specific type of stochastic process that can be thought of as occupying a single state for some duration of time until an event causes an instantaneous state transition to occur. The canonical example of such a process is a queuing system with the state being the number of items currently in the queue. The state changes at the occurrence of an event representing the arrival or departure of an item. We call this a discrete event system because the state change is discrete rather than continuous and is caused by the triggering of an event.

2.1  Trajectories 

Mathematically, we define a stochastic process as a family of random variables $X = \{X_t \mid t \in T\}$. The index set $T$ represents time and is typically the set of non-negative integers, $\mathbb{Z}^{*}$, for discrete-time stochastic processes and the set of non-negative real numbers, $[0, \infty)$, for continuous-time stochastic processes. For each $t \in T$ we have a random variable $X_t$ representing the chance experiment of observing the stochastic process at time $t$. The range of $X_t$ is a set $S$ of states that the stochastic process can occupy, which can be infinite or even uncountable. A trajectory or sample path of a stochastic process is any realization $\{x_t \in S \mid t \in T\}$ of the family of random variables $X$.

The trajectory of a stochastic discrete event system is piecewise constant and can therefore be represented as a sequence $\sigma = \{(s_0, t_0), (s_1, t_1), \ldots\}$, with $s_i \in S$ and $t_i \in T \setminus \{0\}$. Figure 1 plots part of a trajectory for a simple queuing system. Let

$$T_i = \begin{cases} 0 & \text{if } i = 0 \\ \sum_{j=0}^{i-1} t_j & \text{if } i > 0 \end{cases} \tag{1}$$

i.e. $T_i$ is the time at which state $s_i$ is entered and $t_i$ is the duration of time for which the process remains in $s_i$ before an event triggers a transition to state $s_{i+1}$. A trajectory $\sigma$ is then a realization of $X$ with $x_t = s_i$ for $T_i \le t < T_i + t_i$. According to this definition, trajectories of stochastic discrete event systems are right-continuous. A finite trajectory is a sequence $\sigma = \{(s_0, t_0), \ldots, (s_n, \infty)\}$ where $s_n$ is an absorbing state, meaning that no events can occur in $s_n$ and that $x_t = s_n$ for all $t \ge T_n$.
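As an illustration of definition (1), the following minimal Python sketch computes the entry times $T_i$ and looks up $x_t$ for a trajectory stored as a list of (state, holding time) pairs. The function names and the example queue trajectory are hypothetical and are not taken from the original work.

```python
from bisect import bisect_right
from itertools import accumulate
from math import inf


def entry_times(trajectory):
    """Return [T_0, T_1, ...] with T_0 = 0 and T_i = t_0 + ... + t_{i-1}."""
    durations = [t for _, t in trajectory]
    return [0.0] + list(accumulate(durations[:-1]))


def state_at(trajectory, t):
    """Return x_t = s_i for the unique i with T_i <= t < T_i + t_i."""
    if t < 0:
        raise ValueError("time must be non-negative")
    times = entry_times(trajectory)
    i = bisect_right(times, t) - 1  # largest i with T_i <= t
    state, duration = trajectory[i]
    if t >= times[i] + duration:
        raise ValueError("t lies beyond the recorded part of the trajectory")
    return state


# Hypothetical finite trajectory of a queue: the last state is absorbing,
# which is encoded here with an infinite holding time.
queue = [(0, 1.5), (1, 0.5), (2, 2.0), (1, inf)]
```

Because the lookup uses `bisect_right`, the state reported at an entry time $T_i$ is already $s_i$, which matches the right-continuity noted above.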

Note that if $\sum_{i=0}^{\infty} t_i < \infty$ for an infinite trajectory $\sigma$, which is possible if $T$ is the non-negative rational or real numbers, then $x_t$ is not well-defined for all $t \in T$. For this to happen, however, an infinite sequence of events must occur in a finite amount of time, which is unrealistic for any physical system. Hoel et al. (1972) use the term explosive to describe processes for which such sequences can occur with non-zero probability. It is common to assume time divergence for infinite trajectories of real-time systems (cf. Alur and Dill 1994), i.e. that the systems are non-explosive, and most finite-state systems satisfy this property by default.

Figure 1: A trajectory for a simple queuing system with three arrival events and one departure event. The state of the system represents the number of items in the queue.

2.2  Probability  Space  and  “Black-Box”  Probabilistic  Systems 

A prefix of a trajectory $\sigma = \{(s_0, t_0), (s_1, t_1), \ldots\}$ is a sequence $\sigma_{\le\tau} = \{(s'_0, t'_0), \ldots, (s'_k, t'_k)\}$, with $s'_i = s_i$ for all $i \le k$, $\sum_{i=0}^{k} t'_i = \tau$, $t'_i = t_i$ for all $i < k$, and $t'_k < t_k$. Let $Path(\sigma_{\le\tau})$ denote the set of trajectories with common prefix $\sigma_{\le\tau}$. This set must be measurable for probabilistic model checking to make sense, and we assume that a probability measure $\mu$ over the set of trajectories with common prefix exists. This is hardly a severe restriction as such a measure can be defined for systems of practical interest, although the precise definition thereof is not required for the approach to probabilistic model checking considered in this paper. In fact, the lack of knowledge of the probability measure over sets of trajectories can be seen as the defining characteristic of a “black-box” probabilistic system. If we had complete knowledge of this probability measure, then the system under consideration would not be a black box to us. This leads us to make the following definition.

Definition 1 (“Black-box” probabilistic system). A stochastic discrete event system for which the probability measure $\mu$ over sets of trajectories with common prefix is unknown and cannot even be sampled from is called a “black-box” probabilistic system.

A measurable space is a set $\Omega$ with a $\sigma$-algebra $\mathcal{F}_\Omega$ of subsets of $\Omega$ (Halmos 1950). A probability space is a measurable space $(\Omega, \mathcal{F}_\Omega)$ and a probability measure $\mu$ that assigns a value in the interval $[0, 1]$ to the elements of $\mathcal{F}_\Omega$, with $\mu(\emptyset) = 0$, $\mu(\Omega) = 1$, and $\mu(E) = \sum_i \mu(E_i)$ if $E_1, E_2, \ldots$ are countably many pairwise disjoint sets in $\mathcal{F}_\Omega$ and $E$ is their union. When we say that a set $\Omega$ must be measurable, we really mean that there must be a $\sigma$-algebra for the set. The elements of this $\sigma$-algebra are the measurable subsets of $\Omega$.

A stochastic discrete event system is measurable if the sets $S$ and $T$ are measurable. We can show this by defining a $\sigma$-algebra over the set of trajectories with common prefix $\sigma_{\le\tau} = \{(s_0, t_0), \ldots, (s_k, t_k)\}$, denoted $Path(\sigma_{\le\tau})$, as follows. Let $\mathcal{F}_S$ be a $\sigma$-algebra over the state space $S$, and let $\mathcal{F}_T$ be a $\sigma$-algebra over the index set $T$ of the stochastic process. Such $\sigma$-algebras exist if $S$ and $T$ are measurable sets, which by assumption they are. Then $C(\sigma_{\le\tau}, I_k, S_{k+1}, \ldots, I_{n-1}, S_n)$, with $S_i \in \mathcal{F}_S$ and $I_i \in \mathcal{F}_T$, denotes the set of trajectories $\sigma = \{(s'_0, t'_0), (s'_1, t'_1), \ldots\}$ such that $s'_i = s_i$ for $i \le k$, $s'_i \in S_i$ for $k < i \le n$, $t'_i = t_i$ for $i < k$, $t'_k > t_k$, and $t'_i \in I_i$ for $k \le i < n$. In other words, $C(\sigma_{\le\tau}, I_k, S_{k+1}, \ldots, I_{n-1}, S_n)$ is a subset of $Path(\sigma_{\le\tau})$. The sets $C(\sigma_{\le\tau}, I_k, S_{k+1}, \ldots, I_{n-1}, S_n)$ are the elements of a $\sigma$-algebra over the set $Path(\sigma_{\le\tau})$ with set operations applied element-wise, for example

$$C(\sigma_{\le\tau}, I_k, S_{k+1}, \ldots, I_{n-1}, S_n) \cup C(\sigma_{\le\tau}, I'_k, S'_{k+1}, \ldots, I'_{n-1}, S'_n) = C(\sigma_{\le\tau}, I_k \cup I'_k, S_{k+1} \cup S'_{k+1}, \ldots, I_{n-1} \cup I'_{n-1}, S_n \cup S'_n).$$


3  Properties  of  Stochastic  Discrete  Event  Systems 

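A stochastic discrete event system can be specified as a triple $(S, T, \mu)$, where $S$ is a set of states, $T$ is a time domain, and $\mu$ is a probability measure over sets of trajectories with common prefix. We typically assume a factored representation of $S$, with a set of state variables $SV$ and a value assignment function $V(s, x)$ providing the value of $x \in SV$ in state $s$. The domain of $x$ is the set $D_x = \bigcup_{s \in S} V(s, x)$ of possible values that $x$ can take on. We define the syntax of PCTL for a factored stochastic discrete event system $\mathcal{M} = (S, T, \mu, SV, V)$ as

$$\Phi ::= x \sim v \;\mid\; \neg\Phi \;\mid\; \Phi \wedge \Phi \;\mid\; \mathcal{P}_{\bowtie\theta}[X^{I}\,\Phi] \;\mid\; \mathcal{P}_{\bowtie\theta}[\Phi\;\mathcal{U}^{I}\;\Phi],$$

where $x \in SV$, $v \in D_x$, $\sim\, \in \{\le, =, \ge\}$, $\theta \in [0, 1]$, $\bowtie\, \in \{\le, \ge\}$, and $I \subset T$. Additional PCTL formulae can be derived in the usual way. For example, $\bot \equiv (x = v) \wedge \neg(x = v)$ for some $x \in SV$ and $v \in D_x$, $\top \equiv \neg\bot$, $\Phi \vee \Psi \equiv \neg(\neg\Phi \wedge \neg\Psi)$, $\mathcal{P}_{\bowtie\theta}[\Phi\;\mathcal{U}\;\Psi] \equiv \mathcal{P}_{\bowtie\theta}[\Phi\;\mathcal{U}^{T}\;\Psi]$, and $\mathcal{P}_{<\theta}[\varphi] \equiv \neg\mathcal{P}_{\ge\theta}[\varphi]$.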

The standard logic operators have their usual meaning. $\mathcal{P}_{\bowtie\theta}[\varphi]$ asserts that the probability measure over the set of trajectories satisfying the path formula $\varphi$ is related to $\theta$ according to $\bowtie$. Path formulae are constructed using the temporal path operators $X^{I}$ (“next”) and $\mathcal{U}^{I}$ (“until”). The path formula $X^{I}\,\Phi$ asserts that the next state transition occurs $t \in I$ time units into the future and that $\Phi$ holds in the next state, while $\Phi\;\mathcal{U}^{I}\;\Psi$ asserts that $\Psi$ becomes true $t \in I$ time units into the future while $\Phi$ holds continuously prior to $t$.
The validity of a PCTL formula, relative to a factored stochastic discrete event system $\mathcal{M}$, is defined in terms of a satisfaction relation $\models_{\mathcal{M}}$ between trajectory prefixes and PCTL formulae:

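$$\begin{aligned}
\{(s_0, t_0), \ldots, (s_k, t_k)\} &\models_{\mathcal{M}} x \sim v && \text{iff } V(s_k, x) \sim v \\
\sigma_{\le\tau} &\models_{\mathcal{M}} \neg\Phi && \text{iff } \sigma_{\le\tau} \not\models_{\mathcal{M}} \Phi \\
\sigma_{\le\tau} &\models_{\mathcal{M}} \Phi \wedge \Psi && \text{iff } (\sigma_{\le\tau} \models_{\mathcal{M}} \Phi) \wedge (\sigma_{\le\tau} \models_{\mathcal{M}} \Psi) \\
\sigma_{\le\tau} &\models_{\mathcal{M}} \mathcal{P}_{\bowtie\theta}[\varphi] && \text{iff } \mu(\{\sigma \in Path(\sigma_{\le\tau}) \mid \sigma, \tau \models_{\mathcal{M}} \varphi\}) \bowtie \theta
\end{aligned}$$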

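The above definition relies on a satisfaction relation $\sigma, \tau \models_{\mathcal{M}} \varphi$ such that $(\sigma, \tau, \varphi) \in\; \models_{\mathcal{M}}$ iff $\sigma$ satisfies $\varphi$ starting at time $\tau$. This satisfaction relation for path formulae is defined as follows:

$$\begin{aligned}
\sigma, \tau &\models_{\mathcal{M}} X^{I}\,\Phi && \text{iff } \exists k \in \mathbb{N}.\big((T_{k-1} \le \tau) \wedge (\tau < T_k) \wedge (T_k - \tau \in I) \wedge (\sigma_{\le T_k} \models_{\mathcal{M}} \Phi)\big) \\
\sigma, \tau &\models_{\mathcal{M}} \Phi\;\mathcal{U}^{I}\;\Psi && \text{iff } \exists t \in I.\big((\sigma_{\le\tau+t} \models_{\mathcal{M}} \Psi) \wedge \forall t' \in T.((t' < t) \rightarrow (\sigma_{\le\tau+t'} \models_{\mathcal{M}} \Phi))\big)
\end{aligned}$$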

Note that the semantics of $\Phi\;\mathcal{U}^{I}\;\Psi$ requires that $\Phi$ holds continuously, i.e. at all time points, along a trajectory until $\Psi$ is satisfied. This is consistent with the semantics of time-bounded until for TCTL defined by Alur et al. (1991). Depending on the probability measure $\mu$, $\Phi$ may very well hold immediately at the entry of a state $s$ and also immediately after a transition from $s$ to $s'$, but still not hold continuously while the system remains in $s$. Conversely, $\Psi$ may hold at some point in time while the system remains in $s$, and not hold immediately upon entry to $s$ nor immediately after a transition from $s$ to $s'$. It is therefore not sufficient, except in special cases, to verify $\Phi$ and $\Psi$ at discrete points along a trajectory.

If $\Phi$ and $\Psi$ are both free of any probabilistic operators, then it is always sufficient to verify the two formulae once in each state along a trajectory in order to verify $\Phi\;\mathcal{U}^{I}\;\Psi$. The same holds true if

$$\mu(Path(\{(s_0, t_0), \ldots, (s_k, t_k)\})) = \mu(Path(\{(s_k, 0)\})) \tag{2}$$

for all trajectory prefixes $\{(s_0, t_0), \ldots, (s_k, t_k)\}$. This is the case if $\mathcal{M}$ is a Markov chain as (2) simply is a formulation of the Markov property. Our semantics for PCTL interpreted over general stochastic discrete event systems therefore coincides with the semantics for PCTL interpreted over discrete-time Markov chains (Hansson and Jonsson 1994) and CSL interpreted over continuous-time Markov chains (Baier et al. 2003), provided we choose the time domain $T$ appropriately.

A PCTL model checking problem is typically specified as a triple $(\mathcal{M}, s, \Phi)$, with the problem being to verify if $\Phi$ holds for $\mathcal{M}$ provided that execution starts in state $s$, i.e. $\{(s, 0)\} \models_{\mathcal{M}} \Phi$. We often use $s \models \Phi$ as a short form for the latter, leaving out $\mathcal{M}$ when it is clear from the context which system is involved in the model checking problem.

4  Statistical  Model  Checking  for  “Black-Box”  Stochastic  Systems 

We refer to a stochastic discrete event system $\mathcal{M}$ as a “black-box” system if we lack an exact definition of the probability measure $\mu$ over sets of trajectories of $\mathcal{M}$. We assume that we cannot even sample trajectories according to $\mu$, as earlier stated in Definition 1. Thus, in order to solve a model checking problem $s \models \Phi$ for a “black-box” system $\mathcal{M}$, we must rely on an external source to provide us with a set of trajectories for $\mathcal{M}$ that start in state $s$. We assume that trajectories cannot be generated on demand, but that we are provided with a finite set of $n$ trajectories. This sample of size $n$ must of course be representative of the probability measure $\mu(Path(\{(s, 0)\}))$, and we must trust our external source to provide us with a representative set of trajectories. We further assume that we are only provided with truncated trajectories, because infinite trajectories would require infinite memory to store.

We will use statistical hypothesis testing to solve a model checking problem $s \models \Phi$ given a sample of $n$ truncated trajectories. Since we rely on statistical techniques, we will typically not know with certainty if the result we produce is correct. The method we present below computes a p-value for a model checking result, which is a value in the interval $[0, 1]$ with values closer to 0 representing higher confidence in the result and a p-value of 0 representing certainty (Hogg and Craig 1978, pp. 255-256). We start by assuming that $\Phi$ is free of nested probabilistic operators. Later on, we consider PCTL formulae with nested probabilistic operators, which as it turns out cannot be handled in a meaningful way without making rather strong assumptions regarding the dynamics of the “black-box” system.

4.1  PCTL  without  Nested  Probabilistic  Operators 

Given a state $s$, verification of a PCTL formula $x \sim v$ is trivial. We consider the remaining three cases in more detail, starting with the probabilistic operator $\mathcal{P}_{\bowtie\theta}[\cdot]$. Recall that the objective is to produce a Boolean result annotated with a p-value.

4.1.1  Probabilistic  Operator 

Consider the problem of verifying the PCTL formula $\mathcal{P}_{\bowtie\theta}[\varphi]$ in state $s$ of a stochastic discrete event system $\mathcal{M}$. Let $X_i$ be a random variable representing the verification of the path formula $\varphi$ over a trajectory for $\mathcal{M}$ drawn according to the probability measure $\mu(Path(\{(s, 0)\}))$. If we choose $X_i = 1$ to represent the fact that $\varphi$ holds over a random trajectory, and $X_i = 0$ to represent the opposite fact, then $X_i$ is a Bernoulli variate with parameter $p = \mu(\{\sigma \in Path(\{(s, 0)\}) \mid \sigma, 0 \models \varphi\})$, i.e. $\Pr[X_i = 1] = p$ and $\Pr[X_i = 0] = 1 - p$. In order to verify $\mathcal{P}_{\bowtie\theta}[\varphi]$, we can make observations of $X_i$ and use statistical hypothesis testing to determine if $p \bowtie \theta$ is likely to hold. An observation of $X_i$, denoted $x_i$, is the verification of $\varphi$ over a specific trajectory $\sigma_i$. If $\sigma_i$ satisfies the path formula $\varphi$, then $x_i = 1$, otherwise $x_i = 0$.

In our case, we are given $n$ truncated trajectories for a “black-box” system that we can use to generate observations of $X_i$. Each observation is obtained by verifying the path formula $\varphi$ over one of the truncated trajectories. This is straightforward given a truncated trajectory $\{(s_0, t_0), \ldots, (s_{k-1}, t_{k-1}), s_k\}$, provided that $\varphi$ does not contain any probabilistic operators. For $\varphi = X^{I}\,\Phi$, we just check if $t_0 \in I$ and $s_1 \models \Phi$. For $\varphi = \Phi\;\mathcal{U}^{I}\;\Psi$, we traverse the trajectory until we find a state $s_i$ such that one of the following conditions holds, with $T_i$ defined as in (1) to be the time at which state $s_i$ is entered:

1. $(s_i \models \neg\Phi) \wedge ((T_i \notin I) \vee (s_i \models \neg\Psi))$

2. $(T_i \in I) \wedge (s_i \models \Psi)$

3. $((T_i, T_{i+1}) \cap I \ne \emptyset) \wedge (s_i \models \Phi) \wedge (s_i \models \Psi)$

In the first case, $\Phi\;\mathcal{U}^{I}\;\Psi$ does not hold over the trajectory, while in the second two cases the time-bounded until formula does hold. Note that we may not always be able to determine the value of $\varphi$ over all trajectories because the trajectories that are provided to us are assumed to be truncated.
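The traversal described above can be sketched in Python as follows. This is only an illustration under several assumptions that are not in the original text: the time bound is a closed interval $I = [lo, hi]$, the subformulae are given as Boolean predicates `phi` and `psi` on states, a truncated trajectory is stored as a list of $k+1$ states plus a list of $k$ holding times, and `None` stands for "undetermined". The early `False` once the entry time exceeds `hi` is likewise an added convention for the case where time has passed all values in $I$.

```python
def check_next(states, durations, phi, lo, hi):
    """Evaluate X^I phi with I = [lo, hi]; None means undetermined."""
    if not durations:  # truncated before the first transition
        return None
    return lo <= durations[0] <= hi and phi(states[1])


def check_until(states, durations, phi, psi, lo, hi):
    """Evaluate phi U^I psi with I = [lo, hi] over a truncated trajectory.

    states has k+1 entries and durations has k entries, because the
    holding time of the last state is unknown. Returns True, False, or
    None when the truncated trajectory does not settle the question.
    """
    entry = 0.0  # T_i, the time at which states[i] is entered
    for i, s in enumerate(states):
        if entry > hi:  # time has passed every value in I
            return False
        if lo <= entry <= hi and psi(s):  # condition 2
            return True
        if not phi(s):  # condition 1 (condition 2 already failed)
            return False
        if i == len(durations):  # truncation point reached
            return None
        leave = entry + durations[i]  # T_{i+1}
        if psi(s) and leave > lo and entry < hi:  # condition 3
            return True
        entry = leave
    return None  # only reached for an empty trajectory
```

For instance, `check_until([0], [], lambda s: True, lambda s: s == 1, 0, 100)` is undetermined, since the only recorded state satisfies $\Phi$ but not $\Psi$ and its holding time is unknown.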

We consider the case $\mathcal{P}_{\ge\theta}[\varphi]$ in detail, noting that $\mathcal{P}_{\le\theta}[\varphi]$ can be handled in the same way simply by reversing the value of each observation. We want to test the hypothesis $H_0 : p \ge \theta$ against the alternative hypothesis $H_1 : p < \theta$ by using the $n$ observations $x_1, \ldots, x_n$ of the Bernoulli variates $X_1, \ldots, X_n$. To do so, we specify a constant $c$. If $\sum_{i=1}^{n} x_i$ is greater than $c$, then hypothesis $H_0$ is accepted, i.e. $\mathcal{P}_{\ge\theta}[\varphi]$ is determined to hold. Otherwise, if the given sum is at most $c$, then hypothesis $H_1$ is accepted, meaning that $\mathcal{P}_{\ge\theta}[\varphi]$ is determined not to hold. The constant $c$ should be chosen so that it becomes roughly equally likely to accept $H_0$ as $H_1$ if $p$ equals $\theta$. The pair $(n, c)$ is typically called a single sampling plan in the quality control literature (Montgomery 1991).

The probability distribution of a sum of $n$ Bernoulli variates with parameter $p$ is a binomial distribution with parameters $n$ and $p$, denoted $B(n, p)$. The probability of $\sum_{i=1}^{n} X_i$ being at most $c$ is therefore given by the cumulative distribution function for $B(n, p)$:

$$F(c; n, p) = \sum_{i=0}^{c} \binom{n}{i} p^{i} (1 - p)^{n-i} \tag{3}$$

Thus, with probability $F(c; n, p)$ we accept hypothesis $H_1$ using a single sampling plan $(n, c)$, and consequently hypothesis $H_0$ is accepted with probability $1 - F(c; n, p)$ by the same sampling plan. Ideally, we should choose $c$ such that $F(c; n, \theta) = 0.5$, but it is not always possible to attain equality because the binomial distribution is a discrete distribution. The best we can do is to choose $c$ such that $|F(c; n, \theta) - 0.5|$ is minimized. We can readily compute the desired $c$ using (3).
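A direct way to carry out this computation is sketched below in Python. The helper names `binom_cdf` and `choose_c` are hypothetical, and the search range $0, \ldots, n$ for $c$ is an assumption; ties are broken in favour of the smallest $c$, which the text does not specify.

```python
from math import comb


def binom_cdf(c, n, p):
    """F(c; n, p) from equation (3): P[sum of n Bernoulli(p) variates <= c]."""
    if c < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(min(c, n) + 1))


def choose_c(n, theta):
    """Return the c in 0..n that minimizes |F(c; n, theta) - 0.5|."""
    return min(range(n + 1),
               key=lambda c: abs(binom_cdf(c, n, theta) - 0.5))
```

This brute-force search evaluates (3) for every candidate, which costs on the order of $n^2$ term evaluations; for large samples an incremental evaluation of the cumulative sum would be the more economical design.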

Figure 2: A simple two-state continuous-time Markov chain.

We now have a way to decide whether to accept or reject the hypothesis that $\mathcal{P}_{\ge\theta}[\varphi]$ holds, but we also want to report a value reflecting the confidence in our decision. For this purpose, we compute the p-value for a decision. The p-value is defined as the probability of the sum of observations being at least as extreme as the one obtained provided that the hypothesis that was not accepted holds. The p-value for accepting $H_0$ when $\sum_{i=1}^{n} x_i = d$ is $\Pr[\sum_{i=1}^{n} X_i \ge d \mid p < \theta] < F(n - d; n, 1 - \theta) = 1 - F(d - 1; n, \theta)$, while the p-value for accepting $H_1$ is $\Pr[\sum_{i=1}^{n} X_i \le d \mid p \ge \theta] \le F(d; n, \theta)$. The following theorem provides justification for our choice of the constant $c$.

Theorem 1 (Minimization of p-value). By choosing $c$ to minimize $|F(c; n, \theta) - 0.5|$ when testing $H_0 : p \ge \theta$ against $H_1 : p < \theta$ using a single sampling plan $(n, c)$, the hypothesis with the lowest p-value is always accepted.

Proof. Hypothesis $H_1$ is only accepted if $d \le c$, which means that the p-value for $H_1$ under these circumstances is at most $F(c; n, \theta)$. The p-value for $H_0$ if $d \le c$ would be at least $1 - F(c - 1; n, \theta)$. We know that $F(c - 1; n, \theta) < F(c; n, \theta)$ and by assumption that $|F(c - 1; n, \theta) - 0.5| \ge |F(c; n, \theta) - 0.5|$. It follows that $F(c; n, \theta) \le 1 - F(c - 1; n, \theta)$ as required. For $d > c$, the p-value for acceptance of $H_1$ would be at least $F(c + 1; n, \theta)$. The p-value for acceptance of $H_0$ when $d > c$, on the other hand, is at most $1 - F(c; n, \theta)$. We know that $F(c + 1; n, \theta) > F(c; n, \theta)$ and by assumption that $|F(c + 1; n, \theta) - 0.5| \ge |F(c; n, \theta) - 0.5|$. Consequently, $1 - F(c; n, \theta) \le F(c + 1; n, \theta)$ and our choice of $c$ ensures that the hypothesis with the lowest p-value is always accepted. □
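The decision rule of a single sampling plan, together with the p-value bounds stated before Theorem 1, can be sketched in Python as follows. The function names are hypothetical, observations are assumed to be a list of 0/1 values that are all determined, and the reported p-value is the upper bound $1 - F(d - 1; n, \theta)$ or $F(d; n, \theta)$ rather than the exact conditional probability.

```python
from math import comb


def binom_cdf(c, n, p):
    """F(c; n, p) from equation (3)."""
    if c < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(min(c, n) + 1))


def decide(observations, theta, c):
    """Test H0: p >= theta against H1: p < theta with the plan (n, c).

    Returns (holds, p_value), where holds is True when H0 is accepted.
    """
    n = len(observations)
    d = sum(observations)
    if d > c:
        # Accept H0; bound on Pr[sum >= d | p < theta].
        return True, 1.0 - binom_cdf(d - 1, n, theta)
    # Accept H1; bound on Pr[sum <= d | p >= theta].
    return False, binom_cdf(d, n, theta)
```

A call such as `decide([0, 1, 0, 0], 0.5, 2)` accepts $H_1$, because the sum of observations is 1 and therefore at most $c = 2$.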

In the analysis so far we have been assuming that the value of $\varphi$ can be determined over all $n$ truncated trajectories that we are given. Now, consider the case when we are unable to verify the path formula $\varphi$ over some of the $n$ truncated trajectories. This would happen if we are verifying $\Phi\;\mathcal{U}^{I}\;\Psi$ over a trajectory that has been truncated before either $\neg\Phi \vee \Psi$ is satisfied or time exceeds all values in $I$. We cannot simply ignore such trajectories: it is assumed that the entire set of $n$ trajectories is representative of the measure $\mu$, but the subset of truncated trajectories for which we can determine the value of $\varphi$ is not guaranteed to be a representative sample for this measure.

For example, consider the problem of verifying the PCTL formula $\Phi = \mathcal{P}_{\ge 0.9}[\top\;\mathcal{U}^{[0,100]}\;x{=}1]$ in a state satisfying $x{=}0$ for a “black-box” system that in reality is the continuous-time Markov chain shown in Figure 2. The probability measure of trajectories starting in state $x{=}0$ and satisfying $\top\;\mathcal{U}^{[0,100]}\;x{=}1$ is $1 - e^{-1} \approx 0.63$ for this system, so the PCTL formula does not hold, but we would of course not know this unless we had access to the model. Assume that we are provided with a set of 100 truncated trajectories for the system, and that all trajectories have been truncated before time 50. Some of these trajectories, on average roughly 39 in every 100, will satisfy the path formula $\top\;\mathcal{U}^{[0,100]}\;x{=}1$, while the remaining truncated trajectories will not contain sufficient information for us to determine the validity of the path formula over these trajectories. An analysis based solely on the trajectories over which the path formula can be decisively verified would be severely biased. If the number of positive observations is exactly 39, with 61 undetermined observations, we would wrongly conclude that $\Phi$ holds with p-value $1 - F(38; 39, 0.9) \approx 0.0164$, which implies a fairly high confidence in the result.

Let $n'$ be the number of observations whose value we can determine and let $d'$ be the sum of these $n'$ observations. We then know that the sum of all observations, $d$, is at least $d'$ and at most $d' + n - n'$, i.e. $d \in [d', d' + n - n']$. If $d' > c$, then hypothesis $H_0$ can be safely accepted. Instead of a single p-value, we associate an interval of possible p-values with the result: $[F(n' - d'; n, 1 - \theta), F(n - d'; n, 1 - \theta)]$. Conversely, if $d' + n - n' \le c$, then hypothesis $H_1$ can be accepted with p-value in the interval $[F(d'; n, \theta), F(d' + n - n'; n, \theta)]$. If, however, $d' \le c$ and $d' + n - n' > c$, then it is not clear which hypothesis should be accepted. We could in this case say that we do not have enough information to make an informed choice. Alternatively, we could accept one of the hypotheses with its associated p-value interval. We prefer to always make some choice, and we recommend choosing $H_0$ if $F(n - d'; n, 1 - \theta) < F(d' + n - n'; n, \theta)$ and $H_1$ otherwise. This strategy minimizes the maximum possible p-value. Alternatively, we could minimize the minimum possible p-value by instead choosing $H_0$ if $F(n' - d'; n, 1 - \theta) < F(d'; n, \theta)$ and $H_1$ otherwise. Note that this way of treating truncated trajectories makes our approach work even for unbounded until formulae $\Phi\;\mathcal{U}\;\Psi$, although we would typically expect the result to be highly uncertain for such formulae.
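The interval-valued decision just described can be sketched in Python as follows. The function and parameter names are hypothetical, and the threshold $c$ is assumed to be supplied by the caller. Exact rational arithmetic (`fractions.Fraction`) is used as a design choice, because interval endpoints can differ from 1 by far less than double-precision floating point can resolve, which would make the comparisons between endpoints unreliable.

```python
from fractions import Fraction
from math import comb


def binom_cdf(c, n, p):
    """F(c; n, p) from equation (3); exact when p is a Fraction."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(min(c, n) + 1))


def decide_with_unknowns(n, n_known, d_known, theta, c, strategy="minimax"):
    """Decide H0: p >= theta vs H1: p < theta with undetermined observations.

    n_known is n', d_known is d'. Returns (holds, (low, high)), where the
    pair is the interval of possible p-values for the accepted hypothesis.
    strategy="minimax" minimizes the maximum possible p-value; any other
    value minimizes the minimum possible p-value.
    """
    unknown = n - n_known
    h0 = (binom_cdf(n_known - d_known, n, 1 - theta),
          binom_cdf(n - d_known, n, 1 - theta))
    h1 = (binom_cdf(d_known, n, theta),
          binom_cdf(d_known + unknown, n, theta))
    if d_known > c:  # H0 is accepted whatever the unknowns are
        return True, h0
    if d_known + unknown <= c:  # H1 is accepted whatever the unknowns are
        return False, h1
    k = 1 if strategy == "minimax" else 0
    return (True, h0) if h0[k] < h1[k] else (False, h1)
```

For the scenario with 39 positive and 61 undetermined observations, a call could look like `decide_with_unknowns(100, 39, 39, Fraction(9, 10), 90)`, where the value 90 for $c$ is an assumed input rather than a figure taken from the text.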

Consider the same problem as before, with 39 positive and 61 undetermined observations and assuming
the system behaves like the Markov chain shown in Figure 2. The p-value interval for accepting the PCTL
formula $\Phi = \mathcal{P}_{\geq 0.9}[\,\cdots\; x{=}1]$ as true is
$[F(0; 100, 0.1), F(61; 100, 0.1)] \approx [2.65 \cdot 10^{-5}, 1 - 3.77 \cdot 10^{-\ldots}]$.
For the opposite decision, we get the p-value interval
$[F(39; 100, 0.9), F(100; 100, 0.9)] \approx [1.59 \cdot 10^{-35}, 1]$.
Both intervals are almost equally uninformative, so no matter what decision we make, we will
have a high uncertainty in the result. We would accept $\Phi$ as true if we prefer to minimize the maximum
possible p-value, and we would reject $\Phi$ as false if we instead prefer to minimize the minimum possible
p-value, but in both cases we have a maximum p-value well above 0.5. This is in sharp contrast to the faulty
analysis suggested earlier, which led to an acceptance of $\Phi$ as true with a low p-value.
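
The interval computation for truncated trajectories can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`binom_cdf`, `pvalue_intervals`, `accept_in_ambiguous_case`) are ours, $F(d; n, p)$ is taken to be the binomial cumulative distribution function, and exact rational arithmetic is used because the interval end points near 1 differ by far less than floating-point precision. Only the ambiguous case (where the threshold $c$ does not settle the decision) is handled.

```python
from fractions import Fraction
from math import comb


def binom_cdf(d, n, p):
    """F(d; n, p): probability of at most d successes in n Bernoulli(p) trials.

    p is expected to be a Fraction so that the sum is exact.
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(d + 1))


def pvalue_intervals(n, n_det, d_pos, theta):
    """Return (accept_interval, reject_interval) for a formula P>=theta[phi].

    n      -- total number of trajectories
    n_det  -- trajectories over which phi could be decided (n' in the text)
    d_pos  -- decided trajectories that satisfy phi (d' in the text)
    """
    accept = (binom_cdf(n_det - d_pos, n, 1 - theta),
              binom_cdf(n - d_pos, n, 1 - theta))
    reject = (binom_cdf(d_pos, n, theta),
              binom_cdf(d_pos + n - n_det, n, theta))
    return accept, reject


def accept_in_ambiguous_case(n, n_det, d_pos, theta, minimize="max"):
    """Tie-breaking rule: compare the upper ("max") or lower ("min") end points."""
    accept, reject = pvalue_intervals(n, n_det, d_pos, theta)
    k = 1 if minimize == "max" else 0
    return accept[k] < reject[k]


# The example from the text: 39 positive and 61 undetermined observations.
theta = Fraction(9, 10)
accept, reject = pvalue_intervals(100, 39, 39, theta)
print(float(accept[0]), float(reject[0]))
print(accept_in_ambiguous_case(100, 39, 39, theta, "max"))
print(accept_in_ambiguous_case(100, 39, 39, theta, "min"))
```

With a floating-point `theta` the two upper end points would both round to 1, so the comparison for the "max" strategy would no longer be meaningful; that is the reason for using `Fraction` in this sketch.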

4.1.2  Negation 

To verify $\neg\Phi$, we first verify $\Phi$. If we conclude that $\Phi$ has a certain truth-value with p-value $pv$, then we
conclude that $\neg\Phi$ has the opposite truth-value with the same p-value. To motivate this, consider the case
$\neg\mathcal{P}_{\geq\theta}[\varphi]$. To verify $\mathcal{P}_{\geq\theta}[\varphi]$, we test the hypothesis $H_0: p \geq \theta$ against $H_1: p < \theta$ as stated above.
Note, however, that $\neg\mathcal{P}_{\geq\theta}[\varphi] \equiv \mathcal{P}_{<\theta}[\varphi]$, which could be posed as the problem of testing the hypothesis
$H_0': p < \theta$ against $H_1': p \geq \theta$. Since $H_0' = H_1$ and $H_1' = H_0$, we can simply negate the result of verifying
$\mathcal{P}_{\geq\theta}[\varphi]$ while maintaining the same p-value.

4.1.3  Conjunction 

For a conjunction $\Phi \wedge \Psi$, we have to consider four cases. First, if we verify $\Phi$ to hold with p-value $pv_\Phi$ and
$\Psi$ to hold with p-value $pv_\Psi$, then we conclude that $\Phi \wedge \Psi$ holds with p-value $\max(pv_\Phi, pv_\Psi)$. Second, if
we verify $\Phi$ not to hold with p-value $pv$, while verifying that $\Psi$ holds, then we conclude that $\Phi \wedge \Psi$ does
not hold with p-value $pv$. The third case is analogous to the second with $\Phi$ and $\Psi$ interchanged. Finally, if
we verify $\Phi$ not to hold with p-value $pv_\Phi$ and $\Psi$ not to hold with p-value $pv_\Psi$, then we conclude that $\Phi \wedge \Psi$
does not hold with p-value $\min(pv_\Phi, pv_\Psi)$.

Before deriving the given expressions for the p-values associated with the verification result of a
conjunction, let us give an intuitive justification. In order for $\Phi \wedge \Psi$ to hold, both $\Phi$ and $\Psi$ must hold, so we
cannot be any more confident in the result for $\Phi \wedge \Psi$ than we are in the result for the individual conjuncts,
thus the maximum in the first case. To conclude that $\Phi \wedge \Psi$ does not hold, however, we only need to be
convinced  that  one  of  the  conjuncts  does  not  hold.  In  case  we  think  exactly  one  of  the  conjuncts  holds, 
then  the  result  for  the  conjunction  will  be  based  solely  on  this  conviction  and  the  p-value  for  the  conjunct 
we  think  holds  should  not  matter.  This  covers  the  second  and  third  cases.  In  the  fourth  case,  we  have  two 
sources  (not  necessarily  independent)  telling  us  that  the  conjunction  is  false.  We  therefore  have  no  reason 
to  be  less  confident  in  the  result  for  the  conjunction  than  in  the  result  for  each  of  the  conjuncts,  hence  the 
minimum  in  this  case. 
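
The rules for negation and conjunction amount to a small bookkeeping procedure over (truth-value, p-value) pairs. The following Python sketch illustrates them; the `Result` type and the function names `negate` and `conjoin` are hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    holds: bool      # truth-value decided for the formula
    pvalue: float    # p-value attached to that decision


def negate(r):
    """Negation flips the truth-value and keeps the p-value."""
    return Result(not r.holds, r.pvalue)


def conjoin(a, b):
    """Combine the results for two conjuncts according to the four cases."""
    if a.holds and b.holds:
        # Both conjuncts accepted: no more confident than the weaker one.
        return Result(True, max(a.pvalue, b.pvalue))
    if not a.holds and not b.holds:
        # Both conjuncts rejected: at least as confident as the stronger one.
        return Result(False, min(a.pvalue, b.pvalue))
    # Exactly one conjunct rejected: its p-value alone determines the result.
    rejected = b if a.holds else a
    return Result(False, rejected.pvalue)


print(conjoin(Result(True, 0.01), Result(True, 0.03)))
print(conjoin(Result(False, 0.02), Result(True, 0.4)))
print(negate(Result(True, 0.05)))
```

A disjunction could be handled with the same two functions by rewriting it as a negated conjunction of negations, although the text does not spell this out.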

For a mathematical derivation of the given expressions, we consider the formula $\mathcal{P}_{\geq\theta_1}[\varphi_1] \wedge \mathcal{P}_{\geq\theta_2}[\varphi_2]$.
Let $d_i$ denote the number of trajectories that satisfy $\varphi_i$. Provided we accept the conjunction as true, which
means we accept each conjunct as true, the p-value for this result is

$$\Pr\left[\sum_{i=1}^{n} X_i^{(1)} \geq d_1 \wedge \sum_{i=1}^{n} X_i^{(2)} \geq d_2 \;\middle|\; p_1 < \theta_1 \vee p_2 < \theta_2\right]. \tag{4}$$

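To compute this p-value, we consider the three ways in which $p_1 < \theta_1 \vee p_2 < \theta_2$ can be satisfied (cf. Sen
et al. 2004). We know from elementary probability theory that

$$\Pr[A \cap B] \leq \min(\Pr[A], \Pr[B]) \tag{5}$$

for arbitrary events $A$ and $B$. From this fact, and assuming that $pv_i$ is the p-value associated with the
verification result for $\mathcal{P}_{\geq\theta_i}[\varphi_i]$, we derive the following:

1. $\Pr\left[\sum_{i=1}^{n} X_i^{(1)} \geq d_1 \wedge \sum_{i=1}^{n} X_i^{(2)} \geq d_2 \mid p_1 < \theta_1 \wedge p_2 < \theta_2\right] \leq \min(pv_1, pv_2)$

2. $\Pr\left[\sum_{i=1}^{n} X_i^{(1)} \geq d_1 \wedge \sum_{i=1}^{n} X_i^{(2)} \geq d_2 \mid p_1 < \theta_1 \wedge p_2 \geq \theta_2\right] \leq \min(pv_1, 1) = pv_1$

3. $\Pr\left[\sum_{i=1}^{n} X_i^{(1)} \geq d_1 \wedge \sum_{i=1}^{n} X_i^{(2)} \geq d_2 \mid p_1 \geq \theta_1 \wedge p_2 < \theta_2\right] \leq \min(1, pv_2) = pv_2$

We take the maximum over these three cases to obtain a bound for (4), which gives us $\max(pv_1, pv_2)$.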

For the same formula, but now assuming we have verified both conjuncts to be false, we compute the
p-value as

$$\Pr\left[\sum_{i=1}^{n} X_i^{(1)} \leq d_1 \wedge \sum_{i=1}^{n} X_i^{(2)} \leq d_2 \;\middle|\; p_1 \geq \theta_1 \wedge p_2 \geq \theta_2\right]. \tag{6}$$

It follows immediately from (5) that $\min(pv_1, pv_2)$ is a bound for (6), which is the desired result.

4.2  PCTL  with  Nested  Probabilistic  Operators 

If we allow nested probabilistic operators, PCTL model checking for “black-box” stochastic discrete event
systems becomes much harder. Consider the formula $\mathcal{P}_{\geq\theta}[\top \,\mathcal{U}^{[0,100]}\, \mathcal{P}_{\geq\theta'}[\varphi]]$. In order to verify this
formula, we must test if $\mathcal{P}_{\geq\theta'}[\varphi]$ holds at some time $t \in [0, 100]$ along the set of trajectories that we are
given. Unless the time domain $T$ is such that there is a finite number of time points in a finite interval,
we potentially have to verify $\mathcal{P}_{\geq\theta'}[\varphi]$ at an infinite or even uncountable number of points along a trajectory,
which clearly is infeasible. Even if $T = \mathbb{Z}^*$, so that we only have to verify nested probabilistic formulae at
a finite number of points, we still have to take the entire prefix of the trajectory into account at each time
point. We are given a fixed set of trajectories, and we can only use the subset of trajectories with a matching
prefix to verify a nested probabilistic formula. This means that we will have very few trajectories available
to use for the verification of nested probabilistic formulae, most likely only one if the prefix is long, in which
case the uncertainty in the result will be overwhelming.

Only if we assume that the “black-box” system is a Markov chain, which is a rather strong assumption
to make, can we hope to have a significant number of trajectories available for the verification of nested
probabilistic formulae. This is because, under the Markov assumption, we only have to take the last state
along a trajectory prefix into consideration. Consequently, any suffix of a truncated trajectory starting at
a specific state $s$, in the set provided to us by an external source, can be regarded as representative of the
probability measure $\mu(\{\langle s, 0\rangle\})$.
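
To illustrate what the Markov assumption buys us, the sketch below collects every suffix that starts at a given state from a fixed set of truncated trajectories. It assumes, purely for illustration, a discrete-time setting in which a trajectory is a list of state labels; the function name `suffixes_from_state` is hypothetical. Note that suffixes taken from the same trajectory overlap, and the sketch makes no attempt to account for the resulting dependence between samples.

```python
def suffixes_from_state(trajectories, s):
    """Collect every suffix that starts at an occurrence of state s.

    Under the Markov assumption the prefix leading up to s is irrelevant,
    so each suffix is treated as a sample of the behavior starting in s.
    """
    samples = []
    for trajectory in trajectories:
        for i, state in enumerate(trajectory):
            if state == s:
                samples.append(trajectory[i:])
    return samples


trajectories = [["a", "b", "a", "c"], ["b", "c"]]
print(suffixes_from_state(trajectories, "a"))
print(suffixes_from_state(trajectories, "b"))
```

Without the Markov assumption, only trajectories whose entire prefix matches could be used, which in this toy data set would typically leave a single sample per time point.
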

Another complicating factor in the verification of $\mathcal{P}_{\geq\theta}[\varphi]$, where $\varphi$ contains nested probabilistic
operators, is that we cannot verify $\varphi$ over trajectories without some uncertainty in the result. This means that we
no longer obtain observations of the random variables $X_i$ as defined above, but instead we observe some
other random variables $Y_i$ with quite different distributions. We accept $\mathcal{P}_{\geq\theta}[\varphi]$ as true if $\sum_{i=1}^{n} Y_i > c$ for
some constant $c$, and we reject the same formula as false otherwise. We can choose $c$ as previously, but what
is the p-value of the decision?

To compute a p-value for nested verification we assume that $\Pr[Y_i = 0 \mid \sigma, \tau \models \varphi] \leq \alpha$ and
$\Pr[Y_i = 1 \mid \sigma, \tau \not\models \varphi] \leq \beta$. We can make this assumption if we introduce indifference regions in the verification
of probabilistic formulae that are part of $\varphi$. Under the given assumption, we can use the total probability
formula to derive bounds for $\Pr[Y_i = 1]$: $p(1 - \alpha) \leq \Pr[Y_i = 1] \leq 1 - (1 - p)(1 - \beta)$. The p-value for
accepting $\mathcal{P}_{\geq\theta}[\varphi]$ as true when the sum of the observations is $d$ is
$\Pr[\sum_{i=1}^{n} Y_i \geq d \mid p < \theta] \leq F(n - d; n, (1 - \theta)(1 - \beta))$.
The p-value for the opposite decision is $\Pr[\sum_{i=1}^{n} Y_i \leq d \mid p \geq \theta] \leq F(d; n, \theta(1 - \alpha))$.

Since $F(d; n, p)$ increases as $p$ decreases, we see that the p-value increases as the error bounds $\alpha$ and $\beta$
increase, which makes perfect sense. While we said that $c$ can be chosen as previously, this choice no
longer guarantees that the hypothesis with the lowest p-value is accepted. To minimize the p-value of the
result, we can simply compute the p-values of the two hypotheses and accept the hypothesis with the lowest
p-value.
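
The p-value bounds for nested verification, and the rule of accepting the hypothesis with the lowest p-value, can be sketched as follows. The function names (`binom_cdf`, `nested_pvalues`, `decide_nested`) are ours, $F$ is again taken to be the binomial cumulative distribution function, and `alpha` and `beta` are the assumed bounds on the probability of misjudging the path formula over a single trajectory.

```python
from math import comb


def binom_cdf(d, n, p):
    """F(d; n, p): probability of at most d successes in n Bernoulli(p) trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(d + 1))


def nested_pvalues(d, n, theta, alpha, beta):
    """Upper bounds on the p-values for accepting and rejecting P>=theta[phi]."""
    pv_accept = binom_cdf(n - d, n, (1 - theta) * (1 - beta))
    pv_reject = binom_cdf(d, n, theta * (1 - alpha))
    return pv_accept, pv_reject


def decide_nested(d, n, theta, alpha, beta):
    """Accept the hypothesis with the lowest p-value bound."""
    pv_accept, pv_reject = nested_pvalues(d, n, theta, alpha, beta)
    if pv_accept <= pv_reject:
        return True, pv_accept
    return False, pv_reject


# With alpha = beta = 0 the bounds reduce to the p-values of the non-nested case.
print(nested_pvalues(5, 501, 0.01, 0.0, 0.0))
# Non-zero error bounds make both p-value bounds larger.
print(nested_pvalues(5, 501, 0.01, 0.05, 0.05))
```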

We can let the user specify a parameter $\delta_0$ that controls the relative width of the indifference regions.
A probabilistic formula $\mathcal{P}_{\geq\theta}[\varphi]$ is verified with indifference region of half-width $\delta = \delta_0\theta$ if $\theta \leq 0.5$ and
$\delta = \delta_0(1 - \theta)$ otherwise. The verification is carried out using acceptance sampling as before, but with
hypotheses $H_0: p \geq \theta + \delta$ and $H_1: p \leq \theta - \delta$. Instead of reporting a p-value, we report bounds for the
type I error probability of the sampling plan in use if $H_1$ is accepted and the type II error probability if $H_0$
is accepted. The type I error of a sampling plan is defined as the maximum probability of accepting $H_1$
when $H_0$ holds, while the type II error is defined as the maximum probability of accepting $H_0$ when $H_1$
holds. In our case, assuming a sampling plan $(n, c)$ is used, the type I error is $F(c; n, \theta + \delta)$ and the type
II error is $1 - F(c; n, \theta - \delta)$. The error probabilities can be used in the same way as p-values to obtain error
probabilities for compound state formulae. A path formula can be treated as a compound state formula, as
suggested by Younes and Simmons (2002), which allows us to derive error bounds for the verification of
path formulae over trajectories as well. As error bounds for the computation of the p-value for a top-level
probabilistic operator we simply take the maximum error bounds for the verification of the path formula
over all trajectories.
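
A minimal sketch of the indifference-region computation follows. The names `half_width` and `error_bounds` are hypothetical, and the sampling plan $(n, c)$ is assumed to accept $H_0$ exactly when the sum of the observations exceeds $c$. Under that assumption the type II error is the probability that the sum exceeds $c$ when $p = \theta - \delta$, which is why it is computed here as one minus the cumulative distribution function.

```python
from math import comb


def binom_cdf(d, n, p):
    """F(d; n, p): probability of at most d successes in n Bernoulli(p) trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(d + 1))


def half_width(theta, delta0):
    """Half-width of the indifference region, relative to theta or 1 - theta."""
    return delta0 * theta if theta <= 0.5 else delta0 * (1 - theta)


def error_bounds(n, c, theta, delta0):
    """Type I and type II error of the plan (n, c) with an indifference region."""
    delta = half_width(theta, delta0)
    # Type I: accept H1 (sum <= c) although p >= theta + delta.
    type1 = binom_cdf(c, n, theta + delta)
    # Type II: accept H0 (sum > c) although p <= theta - delta.
    type2 = 1 - binom_cdf(c, n, theta - delta)
    return type1, type2


print(half_width(0.2, 0.1), half_width(0.9, 0.1))
print(error_bounds(100, 50, 0.5, 0.2))
```

Both error probabilities shrink as the indifference region widens or the sample grows, which is the property that the p-value alone does not provide.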

5  Related  Work 

The idea of using statistical hypothesis testing for probabilistic model checking of “black-box” systems was
recently proposed by Sen et al. (2004). Their work is the inspiration for the current paper, although mostly
for the wrong reasons. It is in fact the many hidden assumptions, outright errors, and misleading empirical
evaluation of Sen et al.’s presentation that have prompted our interest in the subject.

First, consider the verification of a probabilistic formula $\mathcal{P}_{\geq\theta}[\varphi]$. Their approach is essentially the
same as ours: given a constant $c$, accept if $\sum_{i=1}^{n} X_i > c$ and reject otherwise. Their choice of $c$ is different,
however, and is essentially based on De Moivre’s (1738) normal approximation for the binomial distribution.
Their acceptance condition is $\sum_{i=1}^{n} X_i \geq n\theta$, which corresponds to choosing $c$ to be $\lceil n\theta \rceil - 1$. The mean
of the binomial distribution $B(n, \theta)$ is $n\theta$, so this would be the right thing to do if $\sum_{i=1}^{n} X_i$ could be assumed
to have a normal distribution. De Moivre showed that this is approximately the case for large $n$ if $X_i$
are Bernoulli variates, but the approximation is poor for moderate values of $n$ or if $\theta$ is not close to 0.5.
Their algorithm, as a consequence, will under some circumstances accept a hypothesis with a larger p-value
than the alternative hypothesis. By choosing $c$ as we do, without relying on the normal approximation,
we guarantee that the hypothesis with the smallest p-value is always accepted (Theorem 1). Consider the
formula $\mathcal{P}_{\geq 0.01}[\varphi]$, for example, with $n = 501$ and $d = 5$. Our procedure would accept the formula as
true with p-value 0.562, while the algorithm of Sen et al. would reject the formula as false with p-value
0.614. The difference is not of great significance, but it is still worth pointing out because it demonstrates the
danger of using the normal approximation for the binomial distribution. With today’s fast digital computers,
it is hard to motivate the use thereof. Our procedure is therefore an improvement over the algorithm of Sen
et al.
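
The example with $n = 501$ and $d = 5$ can be checked numerically. The sketch below is an illustration, not the implementation of either algorithm: rather than computing the threshold $c$, it compares the two p-values directly, which by the guarantee stated above should lead to the same decision as our choice of $c$. The function names are hypothetical, and `accept_mean_rule` encodes the acceptance condition attributed to Sen et al. in the text.

```python
from math import comb


def binom_cdf(d, n, p):
    """F(d; n, p): probability of at most d successes in n Bernoulli(p) trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(d + 1))


def pvalues(d, n, theta):
    """p-values for accepting and for rejecting P>=theta[phi] after observing d."""
    pv_accept = binom_cdf(n - d, n, 1 - theta)
    pv_reject = binom_cdf(d, n, theta)
    return pv_accept, pv_reject


def accept_min_pvalue(d, n, theta):
    """Accept the formula if doing so has the smaller p-value."""
    pv_accept, pv_reject = pvalues(d, n, theta)
    return pv_accept <= pv_reject


def accept_mean_rule(d, n, theta):
    """Acceptance condition based on the mean n * theta."""
    return d >= n * theta


pv_accept, pv_reject = pvalues(5, 501, 0.01)
print(round(pv_accept, 3), round(pv_reject, 3))
print(accept_min_pvalue(5, 501, 0.01), accept_mean_rule(5, 501, 0.01))
```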

The second improvement over the method presented by Sen et al. is in the calculation of the p-value for
the verification of a conjunction $\Phi \wedge \Psi$ when both conjuncts have been verified to be false. They state that
the p-value is $pv_\Phi + pv_\Psi$, but this is too conservative. There is no reason to believe that the confidence in
the result for $\Phi \wedge \Psi$ would be lower (i.e. the p-value higher) if we are convinced that both conjuncts are
false. We have shown that the p-value in this case is bounded by $\min(pv_\Phi, pv_\Psi)$, which intuitively makes
more sense.

Sen  et  al.’s  handling  of  nested  probabilistic  operators  is  just  plain  wrong.  They  confuse  the  p-value 
with  the  probability  of  accepting  a  false  hypothesis  (generally  referred  to  as  the  type  I  or  II  error  of  a 
sampling  plan).  The  p-value  is  not  a  bound  on  the  probability  of  a  certain  test  procedure  accepting  a  false 
hypothesis.  In  fact,  the  test  that  both  they  and  we  use  does  not  provide  a  useful  bound  on  the  probability  of 
accepting  a  false  hypothesis.  Their  analysis  relies  heavily  on  the  ability  to  bound  the  probability  of  accepting 
a  false  hypothesis,  so  it  breaks  down  completely.  We  have  proposed  a  way  to  cope  with  this  by  introducing 
indifference  regions  for  nested  probabilistic  operators. 

In addition to getting the verification of nested probabilistic operators wrong, Sen et al. are very vague
regarding  the  assumptions  necessary  to  make  their  approach  produce  a  reliable  answer.  The  fact  that  they 
treat  any  portion  of  a  trajectory  starting  in  s,  regardless  of  the  portion  preceding  s,  as  a  sample  from  the  same 
distribution,  hides  a  rather  strong  assumption  regarding  the  dynamics  of  their  “black-box”  systems.  As  we 
have  pointed  out,  this  is  not  a  valid  assumption  unless  we  know  that  the  system  being  studied  is  a  Markov 
chain.  It  also  appears  as  if  they  only  consider  truncated  trajectories  over  which  they  can  fully  verify  a  path 
formula,  and  this  can  introduce  a  bias  that  very  well  may  invalidate  the  conclusion  they  reach  regarding 
the  truth-value  of  a  probabilistic  formula.  We  have  made  this  quite  clear  in  our  exposition,  and  we  have 
presented  a  sound  procedure  for  handling  the  fact  that  the  value  of  a  path  formula  may  not  be  determined 
over  all  truncated  trajectories  that  are  presented  to  us. 

Finally,  the  empirical  analysis  offered  by  Sen  et  al.  is  misleading.  They  give  the  reader  the  impression 
that  a  certain  p-value  can  be  guaranteed  for  a  verification  result  simply  by  increasing  the  sample  size.  This 
violates  the  premise  of  a  “black-box”  system  stated  by  the  authors  themselves  earlier  in  their  paper,  namely 
that  trajectories  cannot  be  generated  on  demand.  More  important,  though,  is  the  fact  that  a  certain  p-value 
never  can  be  guaranteed.  The  p-value  is  not  a  property  of  a  test,  but  simply  a  function  of  a  specific  set 
of  observations.  If  we  are  unlucky,  we  may  make  observations  that  give  us  a  large  p-value  even  in  cases 
when  this  is  unlikely.  It  is  therefore  misleading  to  say  that  their  algorithm  is  “faster”  than  the  statistical 
model  checking  algorithm  used  by  Younes  et  al.  (2004),  as  the  latter  algorithm  is  properly  designed  to 
realize  a  certain  performance  characteristic.  Their  empirical  results  can  in  fact  not  be  replicated  reliably 
because  there  is  no  fixed  procedure  by  which  they  can  determine  the  sample  size  required  to  achieve  a  certain 
accuracy.  Their  results  give  the  false  impression  that  their  procedure  is  sequential,  i.e.  that  the  sample  size 
automatically  adjusts  to  the  difficulty  of  attaining  a  certain  p-value,  when  in  reality  they  selected  the  reported 
sample sizes manually based on prior empirical testing (K. Sen, personal communication, May 20, 2004).

6  Discussion


Sen et al. (2004) were first to consider the problem of CSL verification for “black-box” systems. We have
generalized this idea to a wider class of probabilistic systems that can be characterized as stochastic discrete
event systems. Our most important contribution is to have given a clear definition of what constitutes a
“black-box” system, and to have made explicit any assumptions making feasible the application of statistical
hypothesis testing as a solution technique for verification of such systems. We have extended the logic PCTL
to enable the expression of properties of general stochastic discrete event systems. The algorithm we have
presented for verifying PCTL properties of “black-box” systems is an improvement over a similar but flawed
algorithm proposed by Sen et al.

The algorithm presented in this paper should not be thought of as an alternative to the statistical model
checking algorithm proposed by Younes and Simmons (2002) and empirically evaluated by Younes et al.
(2004). The two algorithms are complementary rather than competing, and are useful under disparate sets of
assumptions. If we cannot generate trajectories for a system on demand, then the algorithm presented here
allows us to still reach conclusions regarding the behavior of the system. If, however, we know the dynamics
of a system well enough to enable simulation, then we are better off with the alternative approach as it gives
full control over the probability of obtaining an incorrect result.